1 General information

1.1 Welcome Email

Dear students,

A warm welcome to the module Data skills for social work professionals!

I would like to give you some important information on the course:

  1. Physical presence in the course is not mandatory except for the last day (Jan 17), but I strongly urge you to participate in the course during the first four days. For those who do work part-time: please schedule your work accordingly.

  2. It is imperative that you gain some first experience with R and RStudio and make sure that both run on your computer. Please follow the instructions in the “Installation of R and R-Studio” guide (https://drive.switch.ch/index.php/s/ktNsnWxwkJ3olWG), and if necessary, refer to the linked instructions on YouTube. If you have any questions, please feel free to contact us via email. Please use Copilot with the prompt below to guide you through the installation, explain the software to you in plain language and show you what you can do with it.

  3. Enroll on the Moodle page (Kurs: Data skills for social work professionals (in English) - HS24 | BFH Moodle – Die Lernplattform der Berner Fachhochschule) with the following key: HS24-bsc. At least a week before the course starts, you will find there a link to the course script, the literature that you need to prepare and other relevant information. We wish you a successful preparation period and look forward to meeting you in person soon. Please let us know should you have any questions.

Kind regards

Dorian Kessler


Text to enter into Copilot (Microsoft Copilot in Bing; important: use the conversation style «more creative/creative mode» (button in the middle of the screen)): As a social work student, I would like to learn the basics of the programming language R so that I can carry out statistical data analyses for social work projects. I have no prior knowledge of statistics or programming. Can you please give me a step-by-step introduction? Please start by asking whether I have installed R and RStudio and, if not, help me install R and RStudio. Then show me the basic commands and functions of R. I would like to learn how to carry out simple data analyses (e.g. comparisons of means with dplyr), visualize data (with ggplot2) and interpret results. Please observe the following points:

  • Proceed step by step. Only tell me about the next step once the current step is completed. After each step, ask whether I was able to complete it successfully, to make sure that I have done everything correctly.

  • As a first step, tell me exactly how I can orient myself visually in RStudio and where I have to enter things. Where are the console, the script, the data overview and the file overview located in RStudio?

  • Explain to me what the console is and what an R script is, how to create and save an R script and what the purpose of scripts is. Work with me in an R script and tell me how I can run commands.

  • Please guide me through practical exercises and give me tasks to consolidate what I have learned.

  • Offer me support when something is unclear.

  • Work with examples that are relevant for social work. Invent relevant data from the areas of social assistance or child and adult protection.

  • Comment the code in detail, line by line, so that I understand it exactly.

  • At the end, offer me further exercises in case I feel like doing more. Make suggestions for exercises.

  • You are an R expert, but you also know that prospective social workers know little about programming and that technical terms need an explanation in everyday language.

  • Thank you for your motivated support and helpfulness! You are helping me to learn R and to use this knowledge for clients.

  • Important details:

    • Please leave out «print()» if it is not necessary.

    • Whenever you mention the Strg key, also add Ctrl, in case some people have English Windows keyboards.

2 General Introduction

2.1 Learning Goals

  • People gain awareness of data science tools and how they could be used for social work.

  • People learn how to critically evaluate data science products

  • People learn how to do data science with R.

2.2 What is data science?

  • Term that emerged ca. 10 years ago. Predecessors: Statistics, Data analysis.

  • The science of creating valuable information from data

  • Practice-oriented science

  • Combines technical and field expertise

2.3 Datafication or why data science is becoming more important in the future

  • Data is the new oil.

  • Data contains information on human behavior = helps us better understand the human world and solve human problems.

  • In the era of AI, “data literacy” becomes a key skill in all areas of life, including social work –> it should be a basic competence

    • Skills to interpret data

    • Awareness of data and knowing how to use them

    • Skills to analyze data

2.4 How can data science benefit social work?

2.5 Data sources that are relevant for social work

2.5.3 Found data

  • Data not explicitly generated for research
  • Always on
  • Numbers, text, images, audio, video
  • Data from
    • Online activity (digital communication etc.)
    • Smartphone usage (calling, filming, walking etc.)
    • Administrative registries
    • Payments
    • Smart devices
    • Video surveillance

  • Publicly owned individual data

  • Can be linked using social security numbers

The Swiss federation and cantons store data about all of life’s aspects

2.6 Exercise

  • Develop an idea how data science could be used in social work based on the file “Use cases in social work” together with Copilot/ChatGPT
  • Use the following prompt:

You are ChatGPT, and your task is to help me develop a practical example of how data science could be applied in social work. The goal should be an example that is highly useful. Use the file from Dorian Kessler on potential use cases as a reference. Guide me through targeted questions to understand my work context or area of interest and suggest the most relevant application.

Conversation steps:

Understand the context:

Ask me:

  • “Are you currently working in social work? If not, what area interests you most?”

  • “Who are the clients or groups you work with or aim to work with?”

  • “What are common tasks in this field?”

  • “What are the three most pressing problems in your field?”

  • “What data is currently available or could be collected to improve workflows?”

Suggest solutions:

Based on my answers and Dorian Kessler’s file, propose 1–2 realistic examples of how data science could address challenges or improve processes. Briefly explain the benefits.

Get feedback and refine:

Ask:

  • “Does this idea seem relevant and practical for your field? How could it be adjusted to fit better?”

Refine the example with my input and help me select the best option to share with my peers.

  • Post your final ideas on this padlet

2.7 Course plan

3 Measuring the effects of social work

3.1 Why is it important to measure the effects of social work?

Improving practice with better knowledge

3.2 What is an effect and what not?

  • Effect = difference in the result with influencing variable versus without influencing variable (= counterfactual situation)

  • What is the counterfactual situation?

    • The fictional world in which the influencing variable was not present.
  • Exercise
    • Talk to the person sitting next to you.
      • What was the most important event in your life (family, education, work, health, social relationships)?
      • What areas of your life have been affected by this event?
      • What would these areas be like if the event had not happened (can you guess numbers)?
  • Example of effect measures in social work

3.3 How can we measure the effects of social work with quantitative data?

3.3.1 Asking experts

  • Asking individuals about the subjectively measured effect

  • Example: “On a scale from 0 to 10, how much does one daily glass of wine affect your health?”

  • Advantages

    • Easy to measure: one question

    • Subjective expertise: we know a lot about effects (e.g. pain killers)

  • Disadvantages

    • We are unaware of the counterfactual

    • Social desirability bias: we want to please the researcher

3.3.2 Assessing correlations

  • Is there a systematic relationship between two dimensions?

  • Example: wine consumption and dementia

  • Advantages

    • Easy to measure: few questions
      • Wine consumption
      • Dementia symptoms
  • Disadvantages

    • Often: correlation is not equal to causation
    • Why do frequent wine drinkers have less dementia?
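
A minimal sketch of how a correlation can arise without any causal effect. All numbers are invented, and the background factor `education` is only a hypothetical stand-in for whatever might influence both wine consumption and dementia; the source does not name the actual reason.

```r
set.seed(1)
n <- 5000

# Invented background factor that influences both variables (assumption)
education <- rnorm(n)

# Wine consumption rises with education, dementia symptoms fall with it.
# Wine itself has NO effect on dementia in this simulation.
wine <- 0.7 * education + rnorm(n)
dementia <- -0.7 * education + rnorm(n)

# The raw correlation is clearly negative although wine has no causal effect
raw_correlation <- cor(wine, dementia)
raw_correlation

# Once the background factor is held constant in a linear model,
# the coefficient of wine is close to zero
adjusted_model <- lm(dementia ~ wine + education)
adjusted_wine_effect <- coef(adjusted_model)[["wine"]]
adjusted_wine_effect
```

In real data the background factors are usually not all known or measured, which is why a correlation alone rarely justifies a causal statement.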

3.3.3 Experiments - the gold standard

  • Advantages:

    • Secure statements on causality

    • Control over treatment

  • Disadvantages:

    • Ethical problems

    • High financial and administrative burden

    • Often limited generalizability

    • Low variance (often only two manifestations: treatment vs. no treatment)

    • Social desirability (except in double-blind studies with placebo)
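
The logic of an experiment can be sketched with simulated data. The well-being scale, the sample size and the assumed true effect of 2 points are invented for illustration only. Because the treatment is assigned at random, the untreated group stands in for the counterfactual of the treated group.

```r
set.seed(2)
n <- 2000

# Invented potential outcomes for every person:
# well-being without the treatment (the counterfactual) and with it
wellbeing_without <- rnorm(n, mean = 5, sd = 1)
wellbeing_with <- wellbeing_without + 2  # assumed true effect: +2 points

# Random assignment to treatment (1) or no treatment (0)
treated <- rbinom(n, size = 1, prob = 0.5)

# In reality we only observe one of the two outcomes per person
observed <- ifelse(treated == 1, wellbeing_with, wellbeing_without)

# Difference in group means as the estimate of the effect
estimated_effect <- mean(observed[treated == 1]) - mean(observed[treated == 0])
estimated_effect
```

The estimate lands close to the assumed effect of 2 because random assignment makes the two groups comparable on average.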

3.3.4 Natural experiments

  • A random event/dimension (Z) influences the independent variable (X) but affects the outcome (Y) only through X

3.3.5 Exercise

  • Form four groups: one for each method to measure effects

  • Imagine this: you want to find out how meetings with social workers affect client well-being

  • Please define a research design according to your method of effect measurement

    • What data would you analyze?

    • What numbers would you calculate to measure the effect?

4 Prediction and AI in social work

4.1 Why should we use machines to predict in social work?

  • Prediction is an integral part of individual-level social work

    • It is used for diagnosis

      • Identifying clients’ need for assistance

      • The future development of clients’ outcomes without assistance is an integral part of diagnosis

      • Based on predictions of future outcomes, we decide which clients need our help most

    • It is also used for treatment

  • On an aggregate level, we need to predict future need for services to ensure mobilization of adequate resources (i.e. asking for more funding)

  • Social workers, like all humans, make mistakes when predicting future developments.

  • Machines can help us predict outcomes more accurately, which is why we can call this artificial intelligence

    • The helpfulness of machine predictions increases the more data we have

4.2 How do machine predictions work?

  • Basic technology: supervised machine learning

  • We need (a lot of) data about outcomes and determinants of these outcomes

    • “Supervised”, because we tell the computer what the outcome is and what the determinants are
  • Using prediction algorithms, the computer finds rules linking determinants to outcomes. These rule sets are a model.

  • There are simple and less simple prediction algorithms

  • Model is used to predict unknown outcomes with information on determinants

  • Prediction models are more useful for social work practice,

    • the more precisely machines can predict the outcome, and the less biased they are.

    • the clearer it is what can be done to prevent the outcome

    • the more important it is to intervene early

4.3 Examples

4.4 How to train your own prediction model

  • Training a prediction model involves the following steps

    • Acquiring the data with past observations of determinants and outcomes

    • Split observations into training data and test data

    • Maximizing predictive performance

      • Measures of predictive performance

        • Continuous outcomes

          • usually mean squared error
        • Categorical outcomes

          • usually the share of correctly classified observations (accuracy)
      • Play around with the choice of prediction algorithm

      • For algorithms that have parameters: play around with the parameter values
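
The steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are invented, the variable names (`age`, `months_unemployed`, `months_on_assistance`, `long_term`) are hypothetical, and linear and logistic regression are just two simple examples of prediction algorithms.

```r
set.seed(42)

# Step 1: data with past observations (invented here for illustration)
n <- 500
cases <- data.frame(
  age = round(runif(n, 18, 65)),
  months_unemployed = round(runif(n, 0, 36))
)
cases$months_on_assistance <- 2 + 0.5 * cases$months_unemployed +
  0.05 * cases$age + rnorm(n, sd = 3)
cases$long_term <- cases$months_on_assistance > 12  # categorical outcome

# Step 2: split the observations into training data (80%) and test data (20%)
train_rows <- sample(seq_len(n), size = 0.8 * n)
train <- cases[train_rows, ]
test <- cases[-train_rows, ]

# Step 3a: continuous outcome, compare two models by mean squared error
model_baseline <- lm(months_on_assistance ~ 1, data = train)  # predicts the mean
model_full <- lm(months_on_assistance ~ age + months_unemployed, data = train)

mse <- function(model, data) {
  mean((data$months_on_assistance - predict(model, newdata = data))^2)
}
mse_baseline <- mse(model_baseline, test)
mse_full <- mse(model_full, test)
c(baseline = mse_baseline, full = mse_full)

# Step 3b: categorical outcome, share of correctly classified test cases
classifier <- glm(long_term ~ age + months_unemployed,
                  data = train, family = binomial)
predicted_long_term <- predict(classifier, newdata = test, type = "response") > 0.5
accuracy <- mean(predicted_long_term == test$long_term)
accuracy
```

Performance is measured on the test data only, because the model has never seen these observations during training.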

5 Proof of competence (Kompetenznachweis)

6 Introduction to R

6.1 General Information about R

  • R is free and open source.

  • R has an array of powerful statistical methods.

  • All additional tools can be freely downloaded, installed and loaded as so-called packages.

  • With ggplot2 R allows you to create beautiful figures.

  • With the tidyverse and dplyr, R has the simplest language for data preparation.

  • R is more than just statistical software (cf. shiny).

  • R is well known by ChatGPT.

6.2 Data Science workflow

# Install required packages if they are not already installed
required_packages <- c("readxl", "dplyr", "tidyr", "ggplot2", "officer", "flextable")
installed_packages <- installed.packages()

for (pkg in required_packages) {
  if (!(pkg %in% rownames(installed_packages))) {
    install.packages(pkg)
  }
}

# Load the packages (the same ones that were installed above)
library(readxl)
library(dplyr)
library(tidyr)
library(ggplot2)
library(officer)
library(flextable)

# Set the working directory
setwd("C:/Users/kld1/Downloads/")

# 1. Download Excel file
# https://www.pxweb.bfs.admin.ch/pxweb/de/px-x-1304030000_134/-/px-x-1304030000_134.px/table/tableViewLayout2/

url <- "https://www.pxweb.bfs.admin.ch/sq/ecfd5274-e21f-4d26-9bcf-5326af3edc9a"
destfile <- "sozialhilfe.xlsx"

download.file(url, destfile, mode = "wb")

# 2. Read and process data

# Read the Excel sheet (if multiple sheets exist, choose the correct one)
# Assuming the data is in the first sheet
raw_data <- read_excel(destfile, sheet = 1, skip = 2)  # Skip the first 2 rows containing metadata

# Process the data: Select columns, rename, filter rows
data <- raw_data %>%
  select(Kanton='...2', contains("20")) %>%
  filter(!is.na(Kanton), Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))

# Transform the data from wide to long format
long_data <- data %>%
  pivot_longer(
    cols = `2009`:`2022`,
    names_to = "Year",
    values_to = "Count"
  ) %>%
  mutate(Year = as.integer(Year),
         Count = as.numeric(Count))

# 3. Create a plot with ggplot2

# Create a nice ggplot graphic
plot <- ggplot(long_data, aes(x = Year, y = Count, color = Kanton)) +
  geom_line(linewidth = 1) +  # line thickness (ggplot2 >= 3.4.0 uses linewidth instead of size for lines)
  theme_minimal() +
  labs(
    title = "Number of Social Assistance Recipients per Canton (2009-2022)",
    x = "Year",
    y = "Number of Recipients",
    color = "Canton"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12),
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

# Save the plot as an image to insert into Word
ggsave("sozialhilfe_plot.png", plot = plot, width = 12, height = 8, dpi = 300)

# 4. Insert the plot into a Word document

# Create a new Word document
doc <- read_docx()

# Add a title
doc <- doc %>%
  body_add_par("Number of Social Assistance Recipients per Canton (2009-2022)", style = "heading 1")

# Add the plot
doc <- doc %>%
  body_add_img(src = "sozialhilfe_plot.png", width = 6, height = 4, style = "centered")

# Optional: Add a table with the data
# Create an example table (here: all rows for the four selected cantons)
table_data <- long_data %>%
  filter(Kanton %in% c("Bern / Berne", "Zürich", "Basel-Stadt", "Genève"))

ft <- flextable(table_data) %>%
  # Automatically adjust column widths to fit content
  autofit() %>%
  # Adjust individual column widths if necessary (here: 1.5 inches for each of the 3 columns)
  width(j = 1:3, width = 1.5) %>%
  # Set table width to 100% of the document width
  set_table_properties(width = 1, layout = "autofit") %>%
  # Optional: Enhance table aesthetics
  theme_box() %>%
  fontsize(size = 10, part = "all") %>%
  bold(part = "header")  # Bold the header row

# Add the table
doc <- doc %>%
  body_add_par("Example Table of the Data", style = "heading 2") %>%
  body_add_flextable(ft)

# Save the Word document
print(doc, target = "Social_Assistance_Report.docx")
  • RStudio Environment

    • Console Window
    • Source Editor (Syntax window)
    • File Window, Plot Window
    • Environment Window, History Window

6.3 AI coding assistants

  • ChatGPT and other frontier Large Language Models know R pretty well (Copilot also works, but the newest models are more able)

  • After you ask a question, tell ChatGPT what your data look like.

  • If you have no sensitive data, just paste the data in to show ChatGPT the structure. If you have sensitive data, just paste the header (= variable names). With Copilot, data protection issues are smaller.

  • Paste the resulting code back into the R-Script and run the code

  • If you have errors, paste the error (from the console) back into ChatGPT and tell it to solve the problem.

  • If you adapt only parts of your overall code, tell ChatGPT to give you only the relevant code.

  • If it doesn’t comment code, ask to comment and explain what each piece of code does.

6.3.1 Exercise: throwing you in at the deep end

  • Open a new R script, copy the above code into it and save it

  • Copy the code above into ChatGPT or Copilot

  • Ask it to assist you while giving helpful and targeted advice, i.e. that it should tell you how to change the code. Try the following tasks:

    • Exercise 1: Add Luzern and Waadt to the plot and table
    • Exercise 2: Make the plot more beautiful by adding dots to the lines and by making sure every year is displayed on the x-Axis
    • Exercise 3: Develop a bar chart that displays the number of social assistance recipients for each canton for the year 2022 ordered by number of recipients. Ensure the chart includes appropriate titles and axis labels. Also ask it to label the bars with the values (with vertical alignment). Save the bar chart as a PNG file.

6.4 Reading in data

  • R allows you to read in data in all formats, including directly from the internet (see rvest).

  • The most common data storage format is the Excel table. You can open Excel files with the readxl package.

  • The most universal data storage format is csv (comma separated values).

  • The best way to deal with large data are the data.table (to read in large csv-data-files) and the arrow packages (to save and read in large data).

#Set the working directory. Here we use the download folder

setwd("C:/Users/kld1/Downloads/")

#Download the data to the folder by hand

#Office (Büro): https://drive.switch.ch/index.php/s/gdNYHopxWDCV9hr
#Gym (Turnhalle): https://drive.switch.ch/index.php/s/am1T36ehPL24QuQ

#Install (only needed once) and load the Excel package
install.packages("readxl")
library(readxl)

#Read in data from the working directory

Buero <- read_excel("OJAOffice_Statistikdaten_Jugendbüro Oberburg 23.xlsx", sheet = "Statistikdaten 2024")
Turnhalle <- read_excel("OJAOffice_Statistikdaten_offene Turnhalle 24.xlsx", sheet = "Statistikdaten 2024")

#Explain objects, observations and variables
#Explain range and col_names = FALSE
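
For the csv format mentioned above, a minimal base R sketch could look like this. The data and the file name `visits_example.csv` are invented; the file is written to a temporary folder so that the example runs without any download. For large files, the text above recommends the data.table and arrow packages instead.

```r
# Invented example data: three visits with the number of attendees
visits <- data.frame(
  date = c("2024-01-10", "2024-01-17", "2024-01-24"),
  attendees = c(12, 9, 15)
)

# Write the data to a csv file in a temporary folder
csv_file <- file.path(tempdir(), "visits_example.csv")
write.csv(visits, csv_file, row.names = FALSE)

# Read the csv file back in and inspect its structure
visits_read <- read.csv(csv_file)
str(visits_read)
```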

6.4.1 Looking at data

  • RStudio allows you to manually scroll through data

  • This helps you better understand what is going on

#Explain what rows (observations) and columns (variables) are.
#You can either click on the object or...

#use View()
View(Buero)
View(Turnhalle)

#Or even fix data (never do this!)
fix(Buero)

6.4.2 Exercise: reading in data and looking at it

  1. Goal: read in data necessary for measuring the change of gender composition after introduction of mixed gender youth club in Summer 2023.
  2. For each file, what are the column names under which you find information on the date of the attendance, the number of attendees and the age and gender composition of the attendees?
Object name the data should be saved with | Year | Sheet to read in and additional restrictions | Source link
Maedels_22 | 2022 | Statistikdaten 2024 | OJAOffice_Statistikdaten_Moditrff.xlsx
Maedels_23 | 2023 | Moditräff | Statistik OJA Angebote Burgdorf 2023.xlsx
Jungs_22 | 2022 | Gieleträff | OJAOffice_Statistikdaten_Gieltrff.xlsx
Jungs_23 | 2023 | Gieleträff, range = "A11:B11", col_names = FALSE | Statistik OJA Angebote Burgdorf 2023.xlsx
JuBu_23 | 2023 | JuBU Träff 5&6 | Statistik OJA Angebote Burgdorf 2023.xlsx
JuBu_24 | 2024 | Statistikdaten 2024 | Copy of OJAOffice_Statistikdaten_Mittelstufentreff 24.xlsx

6.4.3 Solution: reading in data and looking at it

6.5 Simple data manipulation with dplyr

6.5.1 General

  • package dplyr by Hadley Wickham/Romain Francois offers a toolset for data preparation
  • See the dplyr vignette and the Data Wrangling Cheat Sheet for a very good overview
  • filter(): selects a subset of rows (see also slice())
  • arrange(): sorts
  • select(): selects columns
  • mutate(): creates new columns
  • summarize(): aggregates (collapses) data to individual data points
  • distinct(): removes duplicate values
  • group_by(): defines subgroups in the data so that mutate() and summarize() can be applied separately per group.
  • dplyr can be used very well together with so-called piping, i.e. the data object is passed from function to function by %>%, which makes the code much easier to read and more compact.
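
A small sketch of how these functions combine in one pipe. The data frame `clients` and all of its values are invented for illustration, and the dplyr package is assumed to be installed.

```r
library(dplyr)

# Invented example data; client 5 was entered twice by mistake
clients <- data.frame(
  client_id = c(1, 2, 3, 4, 5, 5),
  canton = c("Bern", "Bern", "Basel-Stadt", "Basel-Stadt", "Bern", "Bern"),
  age = c(34, 51, 27, 45, 62, 62),
  monthly_support = c(1200, 900, 1500, 1100, 800, 800)
)

summary_by_canton <- clients %>%
  distinct() %>%                                     # remove the duplicated row
  filter(age >= 30) %>%                              # keep clients aged 30 or older
  select(canton, monthly_support) %>%                # keep only the columns needed
  mutate(yearly_support = monthly_support * 12) %>%  # create a new column
  group_by(canton) %>%                               # define subgroups
  summarize(
    n_clients = n(),                                 # number of clients per canton
    mean_yearly_support = mean(yearly_support)       # average per canton
  ) %>%
  arrange(desc(mean_yearly_support))                 # sort from highest to lowest

summary_by_canton
```

Each line passes its result to the next function, so the code can be read from top to bottom like a recipe.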

6.5.2 Example: Package dplyr

#install package (only needed once)

#install.packages("dplyr")

#load packages (ggplot2 is needed for the plot at the end)
library(dplyr)
library(ggplot2)

setwd("C:/Users/kld1/switchdrive/BFH/Wichtige Dokumente/Lehre/Data skills for social work professionals/BFH/Daten/Fokus Arbeit/")

focarb <- read.csv("FokusArbeit_Wirkung.csv")

#Select: keep the variables Vitality1 (= measurement of vitality before the intervention), Vitality2 (= measurement of vitality after the intervention) and Interventionsgruppe (did the person participate in Fokus Arbeit or receive standard counseling?)
focarb <- focarb %>%
  select(Vitality1, Vitality2, Interventionsgruppe) %>%
#Filter: drop observations that have a missing value (NA = not available) on the measurement before the intervention. Use logical operators to set the filter condition. Logical operators:
#  is missing: is.na(),
#  bigger than: >,
#  smaller than: <,
#  equals: ==,
#  not equal to: !=,
#  not: !,
#  or: |,
#  is element of: %in%,
#  is infinite: is.infinite().
  filter(!is.na(Vitality1)) %>%
#Mutate: calculate a new variable that measures the change in vitality before versus after
  mutate(Change_Vitality = Vitality2 - Vitality1)


#Plot the distribution of the change for the two groups 

ggplot(focarb,aes(x=Change_Vitality,
                  fill=factor(Interventionsgruppe)))+
  geom_density(alpha=.5)

6.5.3 Exercise: simple data manipulation with dplyr

  • Goal: find out the share of social workers among the working population in Europe and in Switzerland.

  • Read in data from the 11 rounds of the European Social Survey with the following steps:

    • Use the password to download the data

    • Relocate the working directory (using setwd()) to your download folder or move the data to your working directory

    • Make sure you have the arrow package installed (otherwise use install.packages("arrow")).

    • Read in the data with the read_parquet() command.

    • Save it into the object ess.

  • Find out which variables measure the country of origin and the occupation of the respondent using the variable list. Hint 1: start searching from the top. Hint 2: Occupation is measured with the ISCO08 classification. Earlier rounds used the ISCO88 classification, but the data is restricted to the years with the ISCO08 classification (after 2010). Use the variable list to find the exact variable names and labels.

  • Reduce the data frame to those two variables: country, ISCO08.

  • Filter out observations of individuals where information on ISCO08 is missing. The following values should be excluded:

    66666 Not applicable*
    77777 Refusal*
    88888 Don’t know*
    99999 No answer*
  • Create a new variable socialworker that measures whether someone is a social worker or not (click on the variable to know which numbers stand for social workers). Use the ifelse() function to define the variable.

  • Calculate the share of social workers in the whole data set using prop.table(table()). How many social workers per 100 working people are there in Europe?

  • Reduce the data frame to people from Switzerland and repeat. Are there more social workers per 100 working people in Switzerland than in Europe as a whole?
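
How ifelse() and prop.table(table()) work together can be sketched on a toy data set. Everything here is invented: the data frame `respondents`, the variable `occupation_code` and the code 20 are hypothetical placeholders and are not the ESS variable names or the ISCO08 codes you need to find for the exercise.

```r
# Invented toy data with a hypothetical occupation code
respondents <- data.frame(
  country = c("CH", "CH", "DE", "DE", "FR", "FR", "CH", "DE"),
  occupation_code = c(10, 20, 10, 30, 10, 20, 30, 10)
)

# ifelse(condition, value if TRUE, value if FALSE):
# 1 if the person has the occupation of interest (here: code 20), otherwise 0
respondents$target_occupation <- ifelse(respondents$occupation_code == 20, 1, 0)

# table() counts how often each value occurs,
# prop.table() turns these counts into shares that sum to 1
shares <- prop.table(table(respondents$target_occupation))

# Multiply by 100 to get the number per 100 people
shares * 100
```

In this toy data, 2 of the 8 respondents have code 20, so the share for the value 1 is 0.25, or 25 per 100 people.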

6.5.4 Solution: simple data manipulation with dplyr

6.6 Merging and reshaping data

6.6.1 Rbind, cbind

6.6.2 Merge

6.7 Dealing with text and dates

6.7.1 Regular expressions –> ChatGPT

6.7.2 Grepl

6.7.3 Gsub

6.8 Univariate analysis

6.8.1 Frequencies and distributions

6.8.2 Mean, median, mode

6.9 Analysing associations

6.9.1 Frequency tables

6.9.2 Grouped mean

6.9.3 Linear models

6.10 Nice tables

6.10.1 Huxtable

6.11 Nice graphs

6.11.1 Ggplot2

6.12 Workflow

  • Save graphs as png and link them into Word

  • Save tables as docx and link them into Word